Abstract
Background: Up to 10% of US children with sickle cell disease (SCD) develop abnormal transcranial Doppler (TCD) velocities, indicating abnormal cerebral vasculature and a high risk of stroke. Chronic red cell transfusions decrease stroke risk in patients with abnormal TCD. However, transfusions are resource-intensive and burdensome to patients, and they may not fully reverse existing damage. Prevention of TCD abnormalities is therefore ideal. The impact of real-world SCD care practices on TCD results is largely unknown due to an inability to detect TCD outcomes without labor-intensive manual review. We present a novel method for identification and interpretation of TCD results across multiple children's hospitals.
Methods: We aimed to develop accurate, high-throughput interpretation of all TCD procedures performed between 2009 and 2025 at 2 children's hospitals, the Children's Hospital of Philadelphia and Nemours Children's Hospital. Chart review demonstrated significant heterogeneity in clinician interpretation of similar velocity values; therefore, TCD classification was determined based on individual vessel velocities.
We developed a classification algorithm that combines a large language model (LLM), used to identify velocities, with threshold-based interpretation based on the Stroke Prevention Trial in Sickle Cell Anemia (STOP) criteria. We used Llama 4 Scout, an open-source generative model, with prompts engineered to extract time-averaged maximum mean velocities. Our pipeline included 3 sequential stages: (1) LLM prompting to extract velocities from TCD reports, (2) removal of physiologically implausible velocities, and (3) application of STOP criteria. To account for variable clinical documentation, we stratified reports by year and site during prompt development. We iteratively refined our prompt until optimal accuracy was obtained. Imaging vs. non-imaging TCD was determined based on institutional practice, and classification thresholds were adjusted per the ASH 2020 guidelines.
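Stages (2) and (3) of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation. The non-imaging cutoffs are the standard STOP thresholds (conditional at 170 cm/s, abnormal at 200 cm/s). The plausibility bounds, the imaging cutoffs, the rule that a report with no usable velocity is indeterminate, and the names `PLAUSIBLE_RANGE`, `THRESHOLDS`, and `classify_tcd` are all assumptions made for illustration; the abstract does not report these details. The sketch also classifies on the highest velocity in the report, whereas STOP criteria apply to specific vessels.

```python
from typing import Iterable, Optional

# Hypothetical plausibility bounds in cm/s; the abstract does not state the cutoffs used.
PLAUSIBLE_RANGE = (20.0, 400.0)

# (conditional_min, abnormal_min) in cm/s for each TCD modality.
THRESHOLDS = {
    "non_imaging": (170.0, 200.0),  # STOP criteria
    "imaging": (165.0, 185.0),      # assumed adjustment; verify against the ASH 2020 guidelines
}


def classify_tcd(velocities: Iterable[Optional[float]], modality: str = "non_imaging") -> str:
    """Classify one report from its extracted time-averaged maximum mean velocities."""
    low, high = PLAUSIBLE_RANGE
    # Stage 2: drop missing values and physiologically implausible velocities.
    plausible = [v for v in velocities if v is not None and low <= v <= high]
    if not plausible:
        # Assumption: a report with no usable velocity is treated as indeterminate.
        return "indeterminate"
    # Stage 3: apply the thresholds to the highest remaining velocity.
    conditional_min, abnormal_min = THRESHOLDS[modality]
    peak = max(plausible)
    if peak >= abnormal_min:
        return "abnormal"
    if peak >= conditional_min:
        return "conditional"
    return "normal"
```

For example, `classify_tcd([120, 999])` discards the implausible 999 cm/s value and classifies the report on the remaining 120 cm/s reading. Keeping the thresholds in a per-modality table makes the imaging adjustment a configuration change rather than a code change.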
Velocities from 100 reports per center (20% abnormal TCD, 20% conditional TCD, 20% indeterminate TCD, and 40% normal TCD) were manually validated. Additionally, algorithm-generated TCD results were compared automatically with 4,164 TCD reports that a hematologist had previously classified manually. Performance metrics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were calculated for each TCD result category.
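The per-category metrics can be computed by treating each TCD result category in turn as the positive class and all other categories as negative. The following is a minimal sketch of that one-vs-rest calculation, assuming algorithm and reference labels are available as parallel lists; the function name `one_vs_rest_metrics` and the label strings are hypothetical, and the abstract does not describe how the authors computed these values.

```python
from typing import Dict, Sequence


def one_vs_rest_metrics(predicted: Sequence[str], reference: Sequence[str], positive: str) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV for one category against all others."""
    tp = fp = tn = fn = 0
    for pred, ref in zip(predicted, reference):
        if pred == positive and ref == positive:
            tp += 1  # algorithm and reference both assign the category
        elif pred == positive:
            fp += 1  # algorithm assigns the category, reference does not
        elif ref == positive:
            fn += 1  # reference assigns the category, algorithm misses it
        else:
            tn += 1  # neither assigns the category

    def ratio(numerator: int, denominator: int) -> float:
        # An undefined ratio (empty denominator) is reported as NaN rather than 0.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Calling the function once per category (abnormal, conditional, indeterminate, normal) yields the four metrics for each category.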
Results: The study included 6,802 TCD reports. The algorithm distinguished abnormal vs. not-abnormal TCD with near-perfect performance: 100% sensitivity, 99% specificity, PPV of 92%, and NPV of 100%. Results were similar for classification of other outcomes, with an average sensitivity of 92%, specificity of 97.5%, PPV of 94%, and NPV of 98%.
Of 1,582 patients with TCD reports, 1,273 (80.3%) had only normal TCD, 220 (13.9%) had ≥1 conditional TCD but no abnormal results, and 72 (1.76%) had ≥1 abnormal TCD, aligning with expected results. Median age at first abnormal TCD was 5 (IQR: 4-9) years. Of patients with ≥1 conditional TCD, 33 (15%) subsequently developed abnormal TCD.
Discussion: Our NLP-based algorithm accurately classifies TCD results across 2 children's health systems, despite both intra-institution and inter-institution variability in TCD methods and report formatting. Our work is the first to automatically interpret TCD results across multiple institutions. These novel methods will facilitate real-world analysis of a critical SCD outcome and enable comparative effectiveness studies of variable clinical practices. Ongoing work includes TCD analysis across 3 additional sites, clinical studies incorporating TCD results, and development of additional automated tools for improved SCD outcome ascertainment.